{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NEURAL NETWORKS\n",
    "\n",
    "This notebook covers the neural network algorithms from chapter 18 of the book *Artificial Intelligence: A Modern Approach*, by Stuart Russel and Peter Norvig. The code in the notebook can be found in [learning.py](https://github.com/aimacode/aima-python/blob/master/learning.py).\n",
    "\n",
    "Execute the below cell to get started:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from learning import *\n",
    "\n",
    "from notebook import psource, pseudocode"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NEURAL NETWORK ALGORITHM\n",
    "\n",
    "### Overview\n",
    "\n",
    "Although the Perceptron may seem like a good way to make classifications, it is a linear classifier (which, roughly, means it can only draw straight lines to divide spaces) and therefore it can be stumped by more complex problems. To solve this issue we can extend Perceptron by employing multiple layers of its functionality. The construct we are left with is called a Neural Network, or a Multi-Layer Perceptron, and it is a non-linear classifier. It achieves that by combining the results of linear functions on each layer of the network.\n",
    "\n",
    "Similar to the Perceptron, this network also has an input and output layer; however, it can also have a number of hidden layers. These hidden layers are responsible for the non-linearity of the network. The layers are comprised of nodes. Each node in a layer (excluding the input one), holds some values, called *weights*, and takes as input the output values of the previous layer. The node then calculates the dot product of its inputs and its weights and then activates it with an *activation function* (e.g. sigmoid activation function). Its output is then fed to the nodes of the next layer. Note that sometimes the output layer does not use an activation function, or uses a different one from the rest of the network. The process of passing the outputs down the layer is called *feed-forward*.\n",
    "\n",
    "After the input values are fed-forward into the network, the resulting output can be used for classification. The problem at hand now is how to train the network (i.e. adjust the weights in the nodes). To accomplish that we utilize the *Backpropagation* algorithm. In short, it does the opposite of what we were doing up to this point. Instead of feeding the input forward, it will track the error backwards. So, after we make a classification, we check whether it is correct or not, and how far off we were. We then take this error and propagate it backwards in the network, adjusting the weights of the nodes accordingly. We will run the algorithm on the given input/dataset for a fixed amount of time, or until we are satisfied with the results. The number of times we will iterate over the dataset is called *epochs*. In a later section we take a detailed look at how this algorithm works.\n",
    "\n",
    "NOTE: Sometimes we add another node to the input of each layer, called *bias*. This is a constant value that will be fed to the next layer, usually set to 1. The bias generally helps us \"shift\" the computed function to the left or right."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![neural_net](images/neural_net.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementation\n",
    "\n",
    "The `NeuralNetLearner` function takes as input a dataset to train upon, the learning rate (in (0, 1]), the number of epochs and finally the size of the hidden layers. This last argument is a list, with each element corresponding to one hidden layer.\n",
    "\n",
    "After that we will create our neural network in the `network` function. This function will make the necessary connections between the input layer, hidden layer and output layer. With the network ready, we will use the `BackPropagationLearner` to train the weights of our network for the examples provided in the dataset.\n",
    "\n",
    "The NeuralNetLearner returns the `predict` function which, in short, can receive an example and feed-forward it into our network to generate a prediction.\n",
    "\n",
    "In more detail, the example values are first passed to the input layer and then they are passed through the rest of the layers. Each node calculates the dot product of its inputs and its weights, activates it and pushes it to the next layer. The final prediction is the node in the output layer with the maximum value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"\n",
       "   \"http://www.w3.org/TR/html4/strict.dtd\">\n",
       "\n",
       "<html>\n",
       "<head>\n",
       "  <title></title>\n",
       "  <meta http-equiv=\"content-type\" content=\"text/html; charset=None\">\n",
       "  <style type=\"text/css\">\n",
       "td.linenos { background-color: #f0f0f0; padding-right: 10px; }\n",
       "span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }\n",
       "pre { line-height: 125%; }\n",
       "body .hll { background-color: #ffffcc }\n",
       "body  { background: #f8f8f8; }\n",
       "body .c { color: #408080; font-style: italic } /* Comment */\n",
       "body .err { border: 1px solid #FF0000 } /* Error */\n",
       "body .k { color: #008000; font-weight: bold } /* Keyword */\n",
       "body .o { color: #666666 } /* Operator */\n",
       "body .ch { color: #408080; font-style: italic } /* Comment.Hashbang */\n",
       "body .cm { color: #408080; font-style: italic } /* Comment.Multiline */\n",
       "body .cp { color: #BC7A00 } /* Comment.Preproc */\n",
       "body .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */\n",
       "body .c1 { color: #408080; font-style: italic } /* Comment.Single */\n",
       "body .cs { color: #408080; font-style: italic } /* Comment.Special */\n",
       "body .gd { color: #A00000 } /* Generic.Deleted */\n",
       "body .ge { font-style: italic } /* Generic.Emph */\n",
       "body .gr { color: #FF0000 } /* Generic.Error */\n",
       "body .gh { color: #000080; font-weight: bold } /* Generic.Heading */\n",
       "body .gi { color: #00A000 } /* Generic.Inserted */\n",
       "body .go { color: #888888 } /* Generic.Output */\n",
       "body .gp { color: #000080; font-weight: bold } /* Generic.Prompt */\n",
       "body .gs { font-weight: bold } /* Generic.Strong */\n",
       "body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */\n",
       "body .gt { color: #0044DD } /* Generic.Traceback */\n",
       "body .kc { color: #008000; font-weight: bold } /* Keyword.Constant */\n",
       "body .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */\n",
       "body .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */\n",
       "body .kp { color: #008000 } /* Keyword.Pseudo */\n",
       "body .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */\n",
       "body .kt { color: #B00040 } /* Keyword.Type */\n",
       "body .m { color: #666666 } /* Literal.Number */\n",
       "body .s { color: #BA2121 } /* Literal.String */\n",
       "body .na { color: #7D9029 } /* Name.Attribute */\n",
       "body .nb { color: #008000 } /* Name.Builtin */\n",
       "body .nc { color: #0000FF; font-weight: bold } /* Name.Class */\n",
       "body .no { color: #880000 } /* Name.Constant */\n",
       "body .nd { color: #AA22FF } /* Name.Decorator */\n",
       "body .ni { color: #999999; font-weight: bold } /* Name.Entity */\n",
       "body .ne { color: #D2413A; font-weight: bold } /* Name.Exception */\n",
       "body .nf { color: #0000FF } /* Name.Function */\n",
       "body .nl { color: #A0A000 } /* Name.Label */\n",
       "body .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */\n",
       "body .nt { color: #008000; font-weight: bold } /* Name.Tag */\n",
       "body .nv { color: #19177C } /* Name.Variable */\n",
       "body .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */\n",
       "body .w { color: #bbbbbb } /* Text.Whitespace */\n",
       "body .mb { color: #666666 } /* Literal.Number.Bin */\n",
       "body .mf { color: #666666 } /* Literal.Number.Float */\n",
       "body .mh { color: #666666 } /* Literal.Number.Hex */\n",
       "body .mi { color: #666666 } /* Literal.Number.Integer */\n",
       "body .mo { color: #666666 } /* Literal.Number.Oct */\n",
       "body .sa { color: #BA2121 } /* Literal.String.Affix */\n",
       "body .sb { color: #BA2121 } /* Literal.String.Backtick */\n",
       "body .sc { color: #BA2121 } /* Literal.String.Char */\n",
       "body .dl { color: #BA2121 } /* Literal.String.Delimiter */\n",
       "body .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */\n",
       "body .s2 { color: #BA2121 } /* Literal.String.Double */\n",
       "body .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */\n",
       "body .sh { color: #BA2121 } /* Literal.String.Heredoc */\n",
       "body .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */\n",
       "body .sx { color: #008000 } /* Literal.String.Other */\n",
       "body .sr { color: #BB6688 } /* Literal.String.Regex */\n",
       "body .s1 { color: #BA2121 } /* Literal.String.Single */\n",
       "body .ss { color: #19177C } /* Literal.String.Symbol */\n",
       "body .bp { color: #008000 } /* Name.Builtin.Pseudo */\n",
       "body .fm { color: #0000FF } /* Name.Function.Magic */\n",
       "body .vc { color: #19177C } /* Name.Variable.Class */\n",
       "body .vg { color: #19177C } /* Name.Variable.Global */\n",
       "body .vi { color: #19177C } /* Name.Variable.Instance */\n",
       "body .vm { color: #19177C } /* Name.Variable.Magic */\n",
       "body .il { color: #666666 } /* Literal.Number.Integer.Long */\n",
       "\n",
       "  </style>\n",
       "</head>\n",
       "<body>\n",
       "<h2></h2>\n",
       "\n",
       "<div class=\"highlight\"><pre><span></span><span class=\"k\">def</span> <span class=\"nf\">NeuralNetLearner</span><span class=\"p\">(</span><span class=\"n\">dataset</span><span class=\"p\">,</span> <span class=\"n\">hidden_layer_sizes</span><span class=\"o\">=</span><span class=\"bp\">None</span><span class=\"p\">,</span>\n",
       "                     <span class=\"n\">learning_rate</span><span class=\"o\">=</span><span class=\"mf\">0.01</span><span class=\"p\">,</span> <span class=\"n\">epochs</span><span class=\"o\">=</span><span class=\"mi\">100</span><span class=\"p\">,</span> <span class=\"n\">activation</span> <span class=\"o\">=</span> <span class=\"n\">sigmoid</span><span class=\"p\">):</span>\n",
       "    <span class=\"sd\">&quot;&quot;&quot;Layered feed-forward network.</span>\n",
       "<span class=\"sd\">    hidden_layer_sizes: List of number of hidden units per hidden layer</span>\n",
       "<span class=\"sd\">    learning_rate: Learning rate of gradient descent</span>\n",
       "<span class=\"sd\">    epochs: Number of passes over the dataset</span>\n",
       "<span class=\"sd\">    &quot;&quot;&quot;</span>\n",
       "\n",
       "    <span class=\"n\">hidden_layer_sizes</span> <span class=\"o\">=</span> <span class=\"n\">hidden_layer_sizes</span> <span class=\"ow\">or</span> <span class=\"p\">[</span><span class=\"mi\">3</span><span class=\"p\">]</span>  <span class=\"c1\"># default value</span>\n",
       "    <span class=\"n\">i_units</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">dataset</span><span class=\"o\">.</span><span class=\"n\">inputs</span><span class=\"p\">)</span>\n",
       "    <span class=\"n\">o_units</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">dataset</span><span class=\"o\">.</span><span class=\"n\">values</span><span class=\"p\">[</span><span class=\"n\">dataset</span><span class=\"o\">.</span><span class=\"n\">target</span><span class=\"p\">])</span>\n",
       "\n",
       "    <span class=\"c1\"># construct a network</span>\n",
       "    <span class=\"n\">raw_net</span> <span class=\"o\">=</span> <span class=\"n\">network</span><span class=\"p\">(</span><span class=\"n\">i_units</span><span class=\"p\">,</span> <span class=\"n\">hidden_layer_sizes</span><span class=\"p\">,</span> <span class=\"n\">o_units</span><span class=\"p\">,</span> <span class=\"n\">activation</span><span class=\"p\">)</span>\n",
       "    <span class=\"n\">learned_net</span> <span class=\"o\">=</span> <span class=\"n\">BackPropagationLearner</span><span class=\"p\">(</span><span class=\"n\">dataset</span><span class=\"p\">,</span> <span class=\"n\">raw_net</span><span class=\"p\">,</span>\n",
       "                                         <span class=\"n\">learning_rate</span><span class=\"p\">,</span> <span class=\"n\">epochs</span><span class=\"p\">,</span> <span class=\"n\">activation</span><span class=\"p\">)</span>\n",
       "\n",
       "    <span class=\"k\">def</span> <span class=\"nf\">predict</span><span class=\"p\">(</span><span class=\"n\">example</span><span class=\"p\">):</span>\n",
       "        <span class=\"c1\"># Input nodes</span>\n",
       "        <span class=\"n\">i_nodes</span> <span class=\"o\">=</span> <span class=\"n\">learned_net</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">]</span>\n",
       "\n",
       "        <span class=\"c1\"># Activate input layer</span>\n",
       "        <span class=\"k\">for</span> <span class=\"n\">v</span><span class=\"p\">,</span> <span class=\"n\">n</span> <span class=\"ow\">in</span> <span class=\"nb\">zip</span><span class=\"p\">(</span><span class=\"n\">example</span><span class=\"p\">,</span> <span class=\"n\">i_nodes</span><span class=\"p\">):</span>\n",
       "            <span class=\"n\">n</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"o\">=</span> <span class=\"n\">v</span>\n",
       "\n",
       "        <span class=\"c1\"># Forward pass</span>\n",
       "        <span class=\"k\">for</span> <span class=\"n\">layer</span> <span class=\"ow\">in</span> <span class=\"n\">learned_net</span><span class=\"p\">[</span><span class=\"mi\">1</span><span class=\"p\">:]:</span>\n",
       "            <span class=\"k\">for</span> <span class=\"n\">node</span> <span class=\"ow\">in</span> <span class=\"n\">layer</span><span class=\"p\">:</span>\n",
       "                <span class=\"n\">inc</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">n</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"k\">for</span> <span class=\"n\">n</span> <span class=\"ow\">in</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">inputs</span><span class=\"p\">]</span>\n",
       "                <span class=\"n\">in_val</span> <span class=\"o\">=</span> <span class=\"n\">dotproduct</span><span class=\"p\">(</span><span class=\"n\">inc</span><span class=\"p\">,</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">weights</span><span class=\"p\">)</span>\n",
       "                <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"o\">=</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">activation</span><span class=\"p\">(</span><span class=\"n\">in_val</span><span class=\"p\">)</span>\n",
       "\n",
       "        <span class=\"c1\"># Hypothesis</span>\n",
       "        <span class=\"n\">o_nodes</span> <span class=\"o\">=</span> <span class=\"n\">learned_net</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span>\n",
       "        <span class=\"n\">prediction</span> <span class=\"o\">=</span> <span class=\"n\">find_max_node</span><span class=\"p\">(</span><span class=\"n\">o_nodes</span><span class=\"p\">)</span>\n",
       "        <span class=\"k\">return</span> <span class=\"n\">prediction</span>\n",
       "\n",
       "    <span class=\"k\">return</span> <span class=\"n\">predict</span>\n",
       "</pre></div>\n",
       "</body>\n",
       "</html>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "psource(NeuralNetLearner)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## BACKPROPAGATION\n",
    "\n",
    "### Overview\n",
    "\n",
    "In both the Perceptron and the Neural Network, we are using the Backpropagation algorithm to train our model by updating the weights. This is achieved by propagating the errors from our last layer (output layer) back to our first layer (input layer), this is why it is called Backpropagation. In order to use Backpropagation, we need a cost function. This function is responsible for indicating how good our neural network is for a given example. One common cost function is the *Mean Squared Error* (MSE). This cost function has the following format:\n",
    "\n",
    "$$MSE=\\frac{1}{n} \\sum_{i=1}^{n}(y - \\hat{y})^{2}$$\n",
    "\n",
    "Where `n` is the number of training examples, $\\hat{y}$ is our prediction and $y$ is the correct prediction for the example.\n",
    "\n",
    "The algorithm combines the concept of partial derivatives and the chain rule to generate the gradient for each weight in the network based on the cost function.\n",
    "\n",
    "For example, if we are using a Neural Network with three layers, the sigmoid function as our activation function and the MSE cost function, we want to find the gradient for the a given weight $w_{j}$, we can compute it like this:\n",
    "\n",
    "$$\\frac{\\partial MSE(\\hat{y}, y)}{\\partial w_{j}} = \\frac{\\partial MSE(\\hat{y}, y)}{\\partial \\hat{y}}\\times\\frac{\\partial\\hat{y}(in_{j})}{\\partial in_{j}}\\times\\frac{\\partial in_{j}}{\\partial w_{j}}$$\n",
    "\n",
    "Solving this equation, we have:\n",
    "\n",
    "$$\\frac{\\partial MSE(\\hat{y}, y)}{\\partial w_{j}} = (\\hat{y} - y)\\times{\\hat{y}}'(in_{j})\\times a_{j}$$\n",
    "\n",
    "Remember that $\\hat{y}$ is the activation function applied to a neuron in our hidden layer, therefore $$\\hat{y} = sigmoid(\\sum_{i=1}^{num\\_neurons}w_{i}\\times a_{i})$$\n",
    "\n",
    "Also $a$ is the input generated by feeding the input layer variables into the hidden layer.\n",
    "\n",
    "We can use the same technique for the weights in the input layer as well. After we have the gradients for both weights, we use gradient descent to update the weights of the network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pseudocode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "### AIMA3e\n",
       "__function__ BACK-PROP-LEARNING(_examples_, _network_) __returns__ a neural network  \n",
       "&emsp;__inputs__ _examples_, a set of examples, each with input vector __x__ and output vector __y__  \n",
       "&emsp;&emsp;&emsp;&emsp;_network_, a multilayer network with _L_ layers, weights _w<sub>i,j</sub>_, activation function _g_  \n",
       "&emsp;__local variables__: &Delta;, a vector of errors, indexed by network node  \n",
       "\n",
       "&emsp;__repeat__  \n",
       "&emsp;&emsp;&emsp;__for each__ weight _w<sub>i,j</sub>_ in _network_ __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;_w<sub>i,j</sub>_ &larr; a small random number  \n",
       "&emsp;&emsp;&emsp;__for each__ example (__x__, __y__) __in__ _examples_ __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;/\\* _Propagate the inputs forward to compute the outputs_ \\*/  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;__for each__ node _i_ in the input layer __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;_a<sub>i</sub>_ &larr; _x<sub>i</sub>_  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;__for__ _l_ = 2 __to__ _L_ __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;__for each__ node _j_ in layer _l_ __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;_in<sub>j</sub>_ &larr; &Sigma;<sub>_i_</sub> _w<sub>i,j</sub>_ _a<sub>i</sub>_  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;_a<sub>j</sub>_ &larr; _g_(_in<sub>j</sub>_)  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;/\\* _Propagate deltas backward from output layer to input layer_ \\*/  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;__for each__ node _j_ in the output layer __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&Delta;\\[_j_\\] &larr; _g_&prime;(_in<sub>j</sub>_) &times; (_y<sub>i</sub>_ &minus; _a<sub>j</sub>_)  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;__for__ _l_ = _L_ &minus; 1 __to__ 1 __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;__for each__ node _i_ in layer _l_ __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&Delta;\\[_i_\\] &larr; _g_&prime;(_in<sub>i</sub>_) &Sigma;<sub>_j_</sub> _w<sub>i,j</sub>_ &Delta;\\[_j_\\]  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;/\\* _Update every weight in network using deltas_ \\*/  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;__for each__ weight _w<sub>i,j</sub>_ in _network_ __do__  \n",
       "&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;_w<sub>i,j</sub>_ &larr; _w<sub>i,j</sub>_ &plus; _&alpha;_ &times; _a<sub>i</sub>_ &times; &Delta;\\[_j_\\]  \n",
       " &emsp;__until__ some stopping criterion is satisfied  \n",
       " &emsp;__return__ _network_  \n",
       "\n",
       "---\n",
       "__Figure ??__ The back\\-propagation algorithm for learning in multilayer networks."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pseudocode('Back-Prop-Learning')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementation\n",
    "\n",
    "First, we feed-forward the examples in our neural network. After that, we calculate the gradient for each layers' weights by using the chain rule. Once that is complete, we update all the weights using gradient descent. After running these for a given number of epochs, the function returns the trained Neural Network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"\n",
       "   \"http://www.w3.org/TR/html4/strict.dtd\">\n",
       "\n",
       "<html>\n",
       "<head>\n",
       "  <title></title>\n",
       "  <meta http-equiv=\"content-type\" content=\"text/html; charset=None\">\n",
       "  <style type=\"text/css\">\n",
       "td.linenos { background-color: #f0f0f0; padding-right: 10px; }\n",
       "span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }\n",
       "pre { line-height: 125%; }\n",
       "body .hll { background-color: #ffffcc }\n",
       "body  { background: #f8f8f8; }\n",
       "body .c { color: #408080; font-style: italic } /* Comment */\n",
       "body .err { border: 1px solid #FF0000 } /* Error */\n",
       "body .k { color: #008000; font-weight: bold } /* Keyword */\n",
       "body .o { color: #666666 } /* Operator */\n",
       "body .ch { color: #408080; font-style: italic } /* Comment.Hashbang */\n",
       "body .cm { color: #408080; font-style: italic } /* Comment.Multiline */\n",
       "body .cp { color: #BC7A00 } /* Comment.Preproc */\n",
       "body .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */\n",
       "body .c1 { color: #408080; font-style: italic } /* Comment.Single */\n",
       "body .cs { color: #408080; font-style: italic } /* Comment.Special */\n",
       "body .gd { color: #A00000 } /* Generic.Deleted */\n",
       "body .ge { font-style: italic } /* Generic.Emph */\n",
       "body .gr { color: #FF0000 } /* Generic.Error */\n",
       "body .gh { color: #000080; font-weight: bold } /* Generic.Heading */\n",
       "body .gi { color: #00A000 } /* Generic.Inserted */\n",
       "body .go { color: #888888 } /* Generic.Output */\n",
       "body .gp { color: #000080; font-weight: bold } /* Generic.Prompt */\n",
       "body .gs { font-weight: bold } /* Generic.Strong */\n",
       "body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */\n",
       "body .gt { color: #0044DD } /* Generic.Traceback */\n",
       "body .kc { color: #008000; font-weight: bold } /* Keyword.Constant */\n",
       "body .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */\n",
       "body .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */\n",
       "body .kp { color: #008000 } /* Keyword.Pseudo */\n",
       "body .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */\n",
       "body .kt { color: #B00040 } /* Keyword.Type */\n",
       "body .m { color: #666666 } /* Literal.Number */\n",
       "body .s { color: #BA2121 } /* Literal.String */\n",
       "body .na { color: #7D9029 } /* Name.Attribute */\n",
       "body .nb { color: #008000 } /* Name.Builtin */\n",
       "body .nc { color: #0000FF; font-weight: bold } /* Name.Class */\n",
       "body .no { color: #880000 } /* Name.Constant */\n",
       "body .nd { color: #AA22FF } /* Name.Decorator */\n",
       "body .ni { color: #999999; font-weight: bold } /* Name.Entity */\n",
       "body .ne { color: #D2413A; font-weight: bold } /* Name.Exception */\n",
       "body .nf { color: #0000FF } /* Name.Function */\n",
       "body .nl { color: #A0A000 } /* Name.Label */\n",
       "body .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */\n",
       "body .nt { color: #008000; font-weight: bold } /* Name.Tag */\n",
       "body .nv { color: #19177C } /* Name.Variable */\n",
       "body .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */\n",
       "body .w { color: #bbbbbb } /* Text.Whitespace */\n",
       "body .mb { color: #666666 } /* Literal.Number.Bin */\n",
       "body .mf { color: #666666 } /* Literal.Number.Float */\n",
       "body .mh { color: #666666 } /* Literal.Number.Hex */\n",
       "body .mi { color: #666666 } /* Literal.Number.Integer */\n",
       "body .mo { color: #666666 } /* Literal.Number.Oct */\n",
       "body .sa { color: #BA2121 } /* Literal.String.Affix */\n",
       "body .sb { color: #BA2121 } /* Literal.String.Backtick */\n",
       "body .sc { color: #BA2121 } /* Literal.String.Char */\n",
       "body .dl { color: #BA2121 } /* Literal.String.Delimiter */\n",
       "body .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */\n",
       "body .s2 { color: #BA2121 } /* Literal.String.Double */\n",
       "body .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */\n",
       "body .sh { color: #BA2121 } /* Literal.String.Heredoc */\n",
       "body .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */\n",
       "body .sx { color: #008000 } /* Literal.String.Other */\n",
       "body .sr { color: #BB6688 } /* Literal.String.Regex */\n",
       "body .s1 { color: #BA2121 } /* Literal.String.Single */\n",
       "body .ss { color: #19177C } /* Literal.String.Symbol */\n",
       "body .bp { color: #008000 } /* Name.Builtin.Pseudo */\n",
       "body .fm { color: #0000FF } /* Name.Function.Magic */\n",
       "body .vc { color: #19177C } /* Name.Variable.Class */\n",
       "body .vg { color: #19177C } /* Name.Variable.Global */\n",
       "body .vi { color: #19177C } /* Name.Variable.Instance */\n",
       "body .vm { color: #19177C } /* Name.Variable.Magic */\n",
       "body .il { color: #666666 } /* Literal.Number.Integer.Long */\n",
       "\n",
       "  </style>\n",
       "</head>\n",
       "<body>\n",
       "<h2></h2>\n",
       "\n",
       "<div class=\"highlight\"><pre><span></span><span class=\"k\">def</span> <span class=\"nf\">BackPropagationLearner</span><span class=\"p\">(</span><span class=\"n\">dataset</span><span class=\"p\">,</span> <span class=\"n\">net</span><span class=\"p\">,</span> <span class=\"n\">learning_rate</span><span class=\"p\">,</span> <span class=\"n\">epochs</span><span class=\"p\">,</span> <span class=\"n\">activation</span><span class=\"o\">=</span><span class=\"n\">sigmoid</span><span class=\"p\">):</span>\n",
       "    <span class=\"sd\">&quot;&quot;&quot;[Figure 18.23] The back-propagation algorithm for multilayer networks&quot;&quot;&quot;</span>\n",
       "    <span class=\"c1\"># Initialise weights</span>\n",
       "    <span class=\"k\">for</span> <span class=\"n\">layer</span> <span class=\"ow\">in</span> <span class=\"n\">net</span><span class=\"p\">:</span>\n",
       "        <span class=\"k\">for</span> <span class=\"n\">node</span> <span class=\"ow\">in</span> <span class=\"n\">layer</span><span class=\"p\">:</span>\n",
       "            <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">weights</span> <span class=\"o\">=</span> <span class=\"n\">random_weights</span><span class=\"p\">(</span><span class=\"n\">min_value</span><span class=\"o\">=-</span><span class=\"mf\">0.5</span><span class=\"p\">,</span> <span class=\"n\">max_value</span><span class=\"o\">=</span><span class=\"mf\">0.5</span><span class=\"p\">,</span>\n",
       "                                          <span class=\"n\">num_weights</span><span class=\"o\">=</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">weights</span><span class=\"p\">))</span>\n",
       "\n",
       "    <span class=\"n\">examples</span> <span class=\"o\">=</span> <span class=\"n\">dataset</span><span class=\"o\">.</span><span class=\"n\">examples</span>\n",
       "    <span class=\"sd\">&#39;&#39;&#39;</span>\n",
       "<span class=\"sd\">    As of now dataset.target gives an int instead of list,</span>\n",
       "<span class=\"sd\">    Changing dataset class will have effect on all the learners.</span>\n",
       "<span class=\"sd\">    Will be taken care of later.</span>\n",
       "<span class=\"sd\">    &#39;&#39;&#39;</span>\n",
       "    <span class=\"n\">o_nodes</span> <span class=\"o\">=</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span>\n",
       "    <span class=\"n\">i_nodes</span> <span class=\"o\">=</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">]</span>\n",
       "    <span class=\"n\">o_units</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">o_nodes</span><span class=\"p\">)</span>\n",
       "    <span class=\"n\">idx_t</span> <span class=\"o\">=</span> <span class=\"n\">dataset</span><span class=\"o\">.</span><span class=\"n\">target</span>\n",
       "    <span class=\"n\">idx_i</span> <span class=\"o\">=</span> <span class=\"n\">dataset</span><span class=\"o\">.</span><span class=\"n\">inputs</span>\n",
       "    <span class=\"n\">n_layers</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">net</span><span class=\"p\">)</span>\n",
       "\n",
       "    <span class=\"n\">inputs</span><span class=\"p\">,</span> <span class=\"n\">targets</span> <span class=\"o\">=</span> <span class=\"n\">init_examples</span><span class=\"p\">(</span><span class=\"n\">examples</span><span class=\"p\">,</span> <span class=\"n\">idx_i</span><span class=\"p\">,</span> <span class=\"n\">idx_t</span><span class=\"p\">,</span> <span class=\"n\">o_units</span><span class=\"p\">)</span>\n",
       "\n",
       "    <span class=\"k\">for</span> <span class=\"n\">epoch</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">epochs</span><span class=\"p\">):</span>\n",
       "        <span class=\"c1\"># Iterate over each example</span>\n",
       "        <span class=\"k\">for</span> <span class=\"n\">e</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">examples</span><span class=\"p\">)):</span>\n",
       "            <span class=\"n\">i_val</span> <span class=\"o\">=</span> <span class=\"n\">inputs</span><span class=\"p\">[</span><span class=\"n\">e</span><span class=\"p\">]</span>\n",
       "            <span class=\"n\">t_val</span> <span class=\"o\">=</span> <span class=\"n\">targets</span><span class=\"p\">[</span><span class=\"n\">e</span><span class=\"p\">]</span>\n",
       "\n",
       "            <span class=\"c1\"># Activate input layer</span>\n",
       "            <span class=\"k\">for</span> <span class=\"n\">v</span><span class=\"p\">,</span> <span class=\"n\">n</span> <span class=\"ow\">in</span> <span class=\"nb\">zip</span><span class=\"p\">(</span><span class=\"n\">i_val</span><span class=\"p\">,</span> <span class=\"n\">i_nodes</span><span class=\"p\">):</span>\n",
       "                <span class=\"n\">n</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"o\">=</span> <span class=\"n\">v</span>\n",
       "\n",
       "            <span class=\"c1\"># Forward pass</span>\n",
       "            <span class=\"k\">for</span> <span class=\"n\">layer</span> <span class=\"ow\">in</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"mi\">1</span><span class=\"p\">:]:</span>\n",
       "                <span class=\"k\">for</span> <span class=\"n\">node</span> <span class=\"ow\">in</span> <span class=\"n\">layer</span><span class=\"p\">:</span>\n",
       "                    <span class=\"n\">inc</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">n</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"k\">for</span> <span class=\"n\">n</span> <span class=\"ow\">in</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">inputs</span><span class=\"p\">]</span>\n",
       "                    <span class=\"n\">in_val</span> <span class=\"o\">=</span> <span class=\"n\">dotproduct</span><span class=\"p\">(</span><span class=\"n\">inc</span><span class=\"p\">,</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">weights</span><span class=\"p\">)</span>\n",
       "                    <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"o\">=</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">activation</span><span class=\"p\">(</span><span class=\"n\">in_val</span><span class=\"p\">)</span>\n",
       "\n",
       "            <span class=\"c1\"># Initialize delta</span>\n",
       "            <span class=\"n\">delta</span> <span class=\"o\">=</span> <span class=\"p\">[[]</span> <span class=\"k\">for</span> <span class=\"n\">_</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">n_layers</span><span class=\"p\">)]</span>\n",
       "\n",
       "            <span class=\"c1\"># Compute outer layer delta</span>\n",
       "\n",
       "            <span class=\"c1\"># Error for the MSE cost function</span>\n",
       "            <span class=\"n\">err</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">t_val</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">-</span> <span class=\"n\">o_nodes</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">o_units</span><span class=\"p\">)]</span>\n",
       "\n",
       "            <span class=\"c1\"># The activation function used is relu or sigmoid function</span>\n",
       "            <span class=\"k\">if</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">activation</span> <span class=\"o\">==</span> <span class=\"n\">sigmoid</span><span class=\"p\">:</span>\n",
       "                <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">sigmoid_derivative</span><span class=\"p\">(</span><span class=\"n\">o_nodes</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">value</span><span class=\"p\">)</span> <span class=\"o\">*</span> <span class=\"n\">err</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">o_units</span><span class=\"p\">)]</span>\n",
       "            <span class=\"k\">else</span><span class=\"p\">:</span>\n",
       "                <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">relu_derivative</span><span class=\"p\">(</span><span class=\"n\">o_nodes</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">value</span><span class=\"p\">)</span> <span class=\"o\">*</span> <span class=\"n\">err</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">o_units</span><span class=\"p\">)]</span>\n",
       "\n",
       "            <span class=\"c1\"># Backward pass</span>\n",
       "            <span class=\"n\">h_layers</span> <span class=\"o\">=</span> <span class=\"n\">n_layers</span> <span class=\"o\">-</span> <span class=\"mi\">2</span>\n",
       "            <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">h_layers</span><span class=\"p\">,</span> <span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">):</span>\n",
       "                <span class=\"n\">layer</span> <span class=\"o\">=</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span>\n",
       "                <span class=\"n\">h_units</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">layer</span><span class=\"p\">)</span>\n",
       "                <span class=\"n\">nx_layer</span> <span class=\"o\">=</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">]</span>\n",
       "\n",
       "                <span class=\"c1\"># weights from each ith layer node to each i + 1th layer node</span>\n",
       "                <span class=\"n\">w</span> <span class=\"o\">=</span> <span class=\"p\">[[</span><span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">weights</span><span class=\"p\">[</span><span class=\"n\">k</span><span class=\"p\">]</span> <span class=\"k\">for</span> <span class=\"n\">node</span> <span class=\"ow\">in</span> <span class=\"n\">nx_layer</span><span class=\"p\">]</span> <span class=\"k\">for</span> <span class=\"n\">k</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">h_units</span><span class=\"p\">)]</span>\n",
       "\n",
       "                <span class=\"k\">if</span> <span class=\"n\">activation</span> <span class=\"o\">==</span> <span class=\"n\">sigmoid</span><span class=\"p\">:</span>\n",
       "                    <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">sigmoid_derivative</span><span class=\"p\">(</span><span class=\"n\">layer</span><span class=\"p\">[</span><span class=\"n\">j</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">value</span><span class=\"p\">)</span> <span class=\"o\">*</span> <span class=\"n\">dotproduct</span><span class=\"p\">(</span><span class=\"n\">w</span><span class=\"p\">[</span><span class=\"n\">j</span><span class=\"p\">],</span> <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">])</span>\n",
       "                            <span class=\"k\">for</span> <span class=\"n\">j</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">h_units</span><span class=\"p\">)]</span>\n",
       "                <span class=\"k\">else</span><span class=\"p\">:</span>\n",
       "                    <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">relu_derivative</span><span class=\"p\">(</span><span class=\"n\">layer</span><span class=\"p\">[</span><span class=\"n\">j</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">value</span><span class=\"p\">)</span> <span class=\"o\">*</span> <span class=\"n\">dotproduct</span><span class=\"p\">(</span><span class=\"n\">w</span><span class=\"p\">[</span><span class=\"n\">j</span><span class=\"p\">],</span> <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">])</span>\n",
       "                            <span class=\"k\">for</span> <span class=\"n\">j</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">h_units</span><span class=\"p\">)]</span>\n",
       "\n",
       "            <span class=\"c1\">#  Update weights</span>\n",
       "            <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"n\">n_layers</span><span class=\"p\">):</span>\n",
       "                <span class=\"n\">layer</span> <span class=\"o\">=</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">]</span>\n",
       "                <span class=\"n\">inc</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">node</span><span class=\"o\">.</span><span class=\"n\">value</span> <span class=\"k\">for</span> <span class=\"n\">node</span> <span class=\"ow\">in</span> <span class=\"n\">net</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">]]</span>\n",
       "                <span class=\"n\">units</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">layer</span><span class=\"p\">)</span>\n",
       "                <span class=\"k\">for</span> <span class=\"n\">j</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">units</span><span class=\"p\">):</span>\n",
       "                    <span class=\"n\">layer</span><span class=\"p\">[</span><span class=\"n\">j</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">weights</span> <span class=\"o\">=</span> <span class=\"n\">vector_add</span><span class=\"p\">(</span><span class=\"n\">layer</span><span class=\"p\">[</span><span class=\"n\">j</span><span class=\"p\">]</span><span class=\"o\">.</span><span class=\"n\">weights</span><span class=\"p\">,</span>\n",
       "                                                  <span class=\"n\">scalar_vector_product</span><span class=\"p\">(</span>\n",
       "                                                  <span class=\"n\">learning_rate</span> <span class=\"o\">*</span> <span class=\"n\">delta</span><span class=\"p\">[</span><span class=\"n\">i</span><span class=\"p\">][</span><span class=\"n\">j</span><span class=\"p\">],</span> <span class=\"n\">inc</span><span class=\"p\">))</span>\n",
       "\n",
       "    <span class=\"k\">return</span> <span class=\"n\">net</span>\n",
       "</pre></div>\n",
       "</body>\n",
       "</html>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "psource(BackPropagationLearner)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n"
     ]
    }
   ],
   "source": [
    "iris = DataSet(name=\"iris\")\n",
    "iris.classes_to_numbers()\n",
    "\n",
    "nNL = NeuralNetLearner(iris)\n",
    "print(nNL([5, 3, 1, 0.1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The output should be 0, which means the item should get classified in the first class, \"setosa\". Note that since the algorithm is non-deterministic (because of the random initial weights) the classification might be wrong. Usually though, it should be correct.\n",
    "\n",
    "To increase accuracy, you can (most of the time) add more layers and nodes. Unfortunately, increasing the number of layers or nodes also increases the computation cost and might result in overfitting.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}